Storing and Processing Temporal Data in Main Memory Column Stores

Abstract

Managing and accessing temporal data is of increasing importance for many applications in industry. Yet, even though there was a significant amount of academic research during the 1990s, temporal features were only recently included in the SQL:2011 standard. As a result, commercial database systems currently implement only a few temporal operators, and those with rather poor performance. Because several important use cases are not covered adequately by database systems, many developers model the time dimension on the application layer rather than pushing the operators down to the database. Implementing temporal features on the application layer leads to considerable performance overhead. The goal of this dissertation is to develop native support for temporal features in SAP HANA, a commercial in-memory column store database system. As no standard benchmark for temporal databases is available, we propose a new benchmark (TPC-BiH), which allows us to evaluate the performance of both commercial database systems and our own implementations. We investigate different alternatives for physically storing temporal data in main memory and analyze the trade-offs arising from memory layouts that cluster the data either by the time dimension or by the space dimension. Taking the underlying physical representation into account, temporal operators such as temporal aggregation, timeslice and temporal join have to be executed efficiently. We present a novel data structure called the Timeline Index, together with algorithms based on this index, which deliver very competitive performance for all temporal operators. These algorithms beat existing best-of-breed approaches for each operator, in some cases by several orders of magnitude. While analyzing the requirements with clients, it became apparent that many applications include more than one time dimension. User-defined time domains, such as the validity of a contract or the availability of a product, are modeled as an application-time domain, whereas the period during which a fact was visible in the database is represented by the system-time domain. For this reason, we provide a bitemporal extension of the Timeline Index. The Timeline Index is currently being integrated into SAP HANA.
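To make the timeslice and temporal aggregation operators named in the abstract concrete, the following is a minimal Python sketch over system-time versions. It is not the Timeline Index and not SAP HANA code: the `Version` record, the half-open interval convention `[sys_from, sys_to)`, the integer timestamps, and the sample rows are all assumptions chosen for illustration, and the timeslice is a naive full scan.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    """One row version (hypothetical layout, for illustration only)."""
    key: str
    value: int
    sys_from: int            # system time at which the version became visible
    sys_to: Optional[int]    # system time at which it was superseded; None = still visible


def timeslice(versions: List[Version], t: int) -> List[Version]:
    """Return the versions visible at system time t, i.e. sys_from <= t < sys_to."""
    return [
        v for v in versions
        if v.sys_from <= t and (v.sys_to is None or t < v.sys_to)
    ]


def temporal_sum(versions: List[Version]) -> List[Tuple[int, int]]:
    """Temporal aggregation: SUM(value) as of each point where the visible set changes.

    Each version adds its value at sys_from and removes it at sys_to;
    a running total over the sorted change points gives the aggregate.
    """
    deltas = {}
    for v in versions:
        deltas[v.sys_from] = deltas.get(v.sys_from, 0) + v.value
        if v.sys_to is not None:
            deltas[v.sys_to] = deltas.get(v.sys_to, 0) - v.value

    running = 0
    result = []
    for t in sorted(deltas):
        running += deltas[t]
        result.append((t, running))
    return result


# Sample data (invented): key "a" is updated at time 5, key "b" is deleted at time 8.
versions = [
    Version("a", 10, 1, 5),
    Version("a", 12, 5, None),
    Version("b", 7, 3, 8),
]

print([(v.key, v.value) for v in timeslice(versions, 4)])
print(temporal_sum(versions))
```

A bitemporal variant would carry a second interval per version for application time (for example, the validity period of a contract) and filter on both intervals.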


Similar resources

Scaling out Column Stores: Data, Queries, and Transactions

The amount of data available today is huge and keeps increasing steadily. Databases help to cope with huge amounts of data. Yet, traditional databases are not fast enough to answer the complex analytical queries that decision makers in big enterprises ask over large datasets. This is where column stores have their field of application. Tailored to this type of on-line analytical processing (OLA...


Column Stores for Wide and Sparse Data

While it is generally accepted that data warehouses and OLAP workloads are excellent applications for column-stores, this paper speculates that column-stores may well be suited for additional applications. In particular we observe that column-stores do not see a performance degradation when storing extremely wide tables, and column-stores handle sparse data very well. These two properties lead ...


Web Mining Accelerated with In-Memory and Column Store Technology

Current web mining approaches use massive amounts of commodity hardware and processing time to leverage analytics for today’s web. For seamless application interaction, those approaches have to use pre-aggregated results and indexes to circumvent the slow processing on their data stores, e.g. relational databases or document stores. The upcoming trend of in-memory, column-oriented databases is...


Adaptive NUMA-aware data placement and task scheduling for analytical workloads in main-memory column-stores

Non-uniform memory access (NUMA) architectures pose numerous performance challenges for main-memory column-stores in scaling up analytics on modern multi-socket multi-core servers. A NUMA-aware execution engine needs a strategy for data placement and task scheduling that prefers fast local memory accesses over remote memory accesses, and avoids an imbalance of resource utilization, both CPU and ...



Publication date: 2014